KMID : 1155220160410010135
Journal of the Korean Society of Health Information and Health Statistics
2016 Volume.41 No. 1 p.135 ~ p.146
Comparison of Various Classification Tree Methods with Clinical Data
Shin Hye-Jung

Lee Yoon-Dong
Lee Eun-Kyung
Abstract
Objectives: A classification tree is a statistical tool widely used in data mining. It is useful for making statistical decisions in areas such as medicine, biology, and business management. In this paper, we examine recently developed classification tree algorithms, compare them on real examples from medical studies, and provide guidelines for selecting appropriate methods for data analysis.

Methods: For the comparison, we used four clinical datasets from the UCI (University of California, Irvine) repository. We divided each dataset into a 2/3 training set and a 1/3 test set. After fitting models with various R packages (tree, rpart, party, evtree, CORElearn, and randomForest), we calculated misclassification rates separately for the training and test data, and calculated specificity and sensitivity for the test data. This procedure was repeated 200 times, and the misclassification rates were compared using one-way analysis of variance and Tukey's honest significant difference (HSD) test. Specificities and sensitivities were also compared.
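
The resampling procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' R code: the function names (`split_two_thirds`, `fit_stump`, `evaluate`, `repeated_holdout`) are hypothetical, the one-feature threshold rule is only a stand-in for the tree packages named above, and the toy data are synthetic rather than taken from the UCI repository. The one-way ANOVA and Tukey HSD comparison across methods is omitted.

```python
import random
from statistics import mean


def split_two_thirds(rows, rng):
    """Shuffle rows and return (train, test) with about 2/3 of the rows in train."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def fit_stump(train):
    """Hypothetical stand-in classifier: a one-feature threshold rule.

    Rows are (x, y) pairs with numeric x and y in {0, 1}. The stump keeps the
    threshold and direction that give the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in train}):
        for positive_above in (True, False):
            errors = sum(
                ((x >= threshold) == positive_above) != bool(y) for x, y in train
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, positive_above)
    _, threshold, positive_above = best
    return lambda x: int((x >= threshold) == positive_above)


def evaluate(predict, rows):
    """Return misclassification rate, sensitivity, and specificity on rows."""
    tp = fp = tn = fn = 0
    for x, y in rows:
        predicted = predict(x)
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "misclassification": (fp + fn) / len(rows),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def repeated_holdout(rows, fit, repeats=200, seed=0):
    """Repeat the 2/3 vs 1/3 split; record training error and test metrics."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        train, test = split_two_thirds(rows, rng)
        predict = fit(train)
        result = evaluate(predict, test)
        result["train_misclassification"] = evaluate(predict, train)["misclassification"]
        results.append(result)
    return results


if __name__ == "__main__":
    # Synthetic toy data (an assumption for illustration): noisy threshold at x = 15.
    data_rng = random.Random(1)
    toy = [(x, int(x + data_rng.gauss(0, 3) >= 15)) for x in range(30)]
    runs = repeated_holdout(toy, fit_stump)
    print("mean test misclassification:", mean(r["misclassification"] for r in runs))
```

Passing a different `fit` callable to `repeated_holdout` would give one set of 200 error rates per method, which is the input a one-way ANOVA followed by Tukey's HSD would then compare.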

Results: In every case, randomForest showed the best performance. Among the single-tree methods, relative performance differed across datasets. evtree performed better than the other methods in most datasets. Most sensitivities in the Breast Tissue and Dermatology data were quite high. rpart and ctree showed very low specificity in the Dermatology data.

Conclusions: Each method has its own characteristics, and performance depends on the data. Our study shows that the best single-tree method differs across the four example datasets, and that evtree performs slightly better than the other single-tree methods in most datasets. randomForest always shows the best performance, mainly because it combines many trees instead of relying on a single tree.
KEYWORD
Classification, Tree-structured model, Clinical data analysis
Listed journal information
Korea Research Foundation (KCI)